Summary

This paper concerns methods to construct approximate confidence limits for a scalar parameter $\psi$ in the presence of nuisance parameters. The methods are based on Bayesian procedures discussed by Peers (1965) and Stein (1985), in which the prior density is chosen so that the posterior quantiles of $\psi$ are approximate confidence limits with coverage error of order $O(n^{-1})$ under repeated sampling. Multidimensional integration of the posterior density is avoided by using approximations of marginal densities and distribution functions; thus, adjustments are obtained that improve the standard normal approximation to the distributions of signed roots of the profile and conditional likelihood ratio statistics for $\psi$. The necessary prior densities are easy to specify when the nuisance parameters are orthogonal to the parameter of interest, and this simplicity is exploited in developing the methods. However, the need for explicit specification of an orthogonal parameterization is alleviated by approximating the Jacobian of a transformation to orthogonality. The methods are illustrated and compared with other procedures in some examples involving exponential families.

Some key words: Asymptotic normality; Conditional profile likelihood; Confidence limit; Exponential family; Gamma distribution; Marginal density approximation; Noninformative prior; Nuisance parameter; Orthogonal parameterization; Profile likelihood; Signed root likelihood ratio statistic; Tail probability approximation.


1.  Introduction 


Consider observed random variables $X_1, \ldots, X_n$ whose joint distribution depends on a $d$-dimensional parameter $\phi = (\phi^1, \ldots, \phi^d)$, and suppose that inference about a scalar parameter $\psi = \psi(\phi)$ is of interest. Assume that the log likelihood function $l(\phi)$ attains its global maximum at $\hat\phi = (\hat\phi^1, \ldots, \hat\phi^d)$ and that the constrained maximum is attained at $\tilde\phi(\psi) = (\tilde\phi^1, \ldots, \tilde\phi^d)$ for fixed $\psi$. Then the maximum likelihood estimator of $\psi$ is $\hat\psi = \psi(\hat\phi)$, and $\tilde\phi(\hat\psi) = \hat\phi$. Expressed in terms of the profile likelihood function $l_p(\psi) = l\{\tilde\phi(\psi)\}$, the likelihood ratio statistic for testing $\psi = \psi_0$ is $W(\psi_0) = 2\{l_p(\hat\psi) - l_p(\psi_0)\}$, and the signed root of the likelihood ratio statistic is $R(\psi_0) = \mathrm{sgn}(\hat\psi - \psi_0)\{W(\psi_0)\}^{1/2}$. Under the null hypothesis, the standard normal approximation to the conditional distribution of $R(\psi_0)$ typically has error of order $O(n^{-1/2})$, where the conditioning is on an exact or approximate ancillary statistic (McCullagh (1984), Barndorff-Nielsen (1986)). Consequently, the value of $\psi$ that satisfies $\Phi\{R(\psi)\} = \alpha$, where $\Phi$ is the standard normal distribution function, is an approximate upper $1 - \alpha$ confidence limit having coverage error of order $O(n^{-1/2})$, both conditionally and unconditionally. The primary goal of this paper is to develop related methods for constructing approximate confidence limits that attain higher coverage accuracy.
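As a purely illustrative sketch of the first-order limit just described, the following Python fragment solves $\Phi\{R(\theta)\} = \alpha$ by bisection for a one-parameter model. The model (an exponential sample with rate $\theta$, summarized by the sample size and the sample total) and all function names are assumptions made for the illustration; they do not come from the paper.

```python
import math
from statistics import NormalDist

def signed_root(theta, n, total):
    """Signed root R(theta) for an exponential sample with rate theta (illustrative model)."""
    mle = n / total
    # W = 2{l(mle) - l(theta)} with l(theta) = n*log(theta) - theta*total
    w = 2.0 * (n * math.log(mle / theta) - (mle - theta) * total)
    return math.copysign(math.sqrt(max(w, 0.0)), mle - theta)

def upper_limit(alpha, n, total, tol=1e-12):
    """Solve Phi{R(theta)} = alpha for theta by bisection."""
    cdf = NormalDist().cdf
    mle = n / total
    lo, hi = mle * 1e-6, mle * 1e3      # R is decreasing in theta on this bracket
    while hi - lo > tol * mle:
        mid = 0.5 * (lo + hi)
        if cdf(signed_root(mid, n, total)) > alpha:
            lo = mid                    # still below the limit
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data summary: n = 10 observations with total 10.0
limit = upper_limit(0.05, n=10, total=10.0)
```

The bisection relies on $R(\theta)$ being monotone decreasing in $\theta$, which holds for this model; other models may need a different root finder or bracket.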

Bayesian procedures are available to construct improved confidence limits. Various authors have considered how to choose a prior density $\pi(\phi)$ so that, for each $\alpha$, the posterior $1 - \alpha$ quantile of $\psi$ is an upper confidence limit for the parameter $\psi$ with coverage $1 - \alpha + O(n^{-1})$ in the repeated sampling sense. When there are no nuisance parameters, Welch and Peers (1963) showed that the prior density should be chosen proportional to the square root of the expected information for $\psi$. When nuisance parameters are present, there is considerable arbitrariness in the choice of prior density, and Peers (1965) and Stein (1985) developed differential equations for $\pi(\phi)$ whose solutions yield limits having coverage error of order $O(n^{-1})$. Unfortunately, two difficulties often arise in implementing the Bayesian methods. Exact calculation of the posterior quantiles of $\psi$ usually requires numerical integration, which can be cumbersome. Moreover, for a convenient parameterization $\phi$, solutions of the Peers and Stein differential equations can be difficult to find.

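For Bayesian inference using a prior density $\pi(\phi)$ for $\phi$, approximations to the posterior density and distribution function of $\psi$ have been developed that avoid numerical integration. To describe these approximations, some additional notation is necessary. Let $l_i(\phi) = \partial l(\phi)/\partial\phi^i$, $l_{ij}(\phi) = \partial^2 l(\phi)/\partial\phi^i\partial\phi^j$, $\psi_i(\phi) = \partial\psi(\phi)/\partial\phi^i$, $\psi_{ij}(\phi) = \partial^2\psi(\phi)/\partial\phi^i\partial\phi^j$ $(i, j = 1, \ldots, d)$, and assume the gradient of $\psi$ is nonzero. A Lagrange-multiplier argument shows there exists a constant $\tau(\psi)$ such that $\tau(\psi) = l_i(\tilde\phi)/\psi_i(\tilde\phi)$ for all $i$ satisfying $\psi_i(\tilde\phi) \neq 0$, where $\tilde\phi = \tilde\phi(\psi)$. One value of the index $i$ having this property always exists by assumption, and hence $\tau(\psi)$ is uniquely determined. Set

\[ J_{ij}(\psi) = -l_{ij}(\tilde\phi) + \tau(\psi)\psi_{ij}(\tilde\phi) \quad (i, j = 1, \ldots, d). \]

Then $J_{ij}(\hat\psi) = -l_{ij}(\hat\phi)$, since $\tau(\hat\psi) = 0$. Define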


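\[ T(\psi) = \frac{1}{\tau(\psi)}\,\frac{\pi(\tilde\phi)}{\pi(\hat\phi)}\left[\frac{\det\{J_{ij}(\hat\psi)\}}{Q(\psi)\det\{J_{ij}(\psi)\}}\right]^{1/2}, \tag{1} \]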


where $Q(\psi) = J^{ij}(\psi)\psi_i(\tilde\phi)\psi_j(\tilde\phi)$, $\{J^{ij}(\psi)\}$ is the $d \times d$ matrix inverse of $\{J_{ij}(\psi)\}$, and the standard summation convention is used. The Laplace approximation to the marginal posterior density of $\psi$ given by Tierney, Kass and Kadane (1989) is

\[ f_{\psi|X}(\psi) \doteq c\,T(\psi)\tau(\psi)\exp\{l_p(\psi) - l_p(\hat\psi)\}, \tag{2} \]

where $X = (X_1, \ldots, X_n)$ and $c$ is a normalizing constant. Approximations to the posterior distribution function of $\psi$ that follow from DiCiccio and Martin (1991, 1992) are

\[ 1 - F_{\psi|X}(\psi) \doteq \Phi(R) + \varphi(R)(R^{-1} - T), \qquad 1 - F_{\psi|X}(\psi) \doteq \Phi\{R - R^{-1}\log(RT)\}, \tag{3} \]

where $R = R(\psi)$, $T = T(\psi)$, and $\Phi$ and $\varphi$ are the standard normal distribution function and density, respectively. For arguments $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$, the relative error of approximation (2) is $O(n^{-3/2})$ and the errors of approximations (3) are also $O(n^{-3/2})$.
It follows that if a prior density of the Peers and Stein type is assumed, then the $1 - \alpha$ quantile of approximation (2) and the solutions of the equations

\[ \Phi(R) + \varphi(R)(R^{-1} - T) = \alpha, \qquad \Phi\{R - R^{-1}\log(RT)\} = \alpha, \tag{4} \]

are upper confidence limits for the parameter $\psi$ having coverage $1 - \alpha + O(n^{-1})$. In some cases, there exists a prior density for which the posterior quantiles of $\psi$ are approximate confidence limits with coverage error of order $O(n^{-3/2})$ or smaller. When such a prior density is used, the confidence limits derived from approximations (2) and (3) have coverage $1 - \alpha + O(n^{-3/2})$.
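The two tail approximations in (3), whose level sets give the equations in (4), are simple functions of $R$ and $T$. The sketch below evaluates both; the function names are ours, and the code assumes $R \neq 0$ and $RT > 0$, since both expressions are singular at $R = 0$.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def tail_additive(r, t):
    """First form in (3): Phi(R) + phi(R) * (1/R - T)."""
    return _STD.cdf(r) + _STD.pdf(r) * (1.0 / r - t)

def tail_exponent(r, t):
    """Second form in (3): Phi{R - log(R*T)/R}; needs R*T > 0."""
    return _STD.cdf(r - math.log(r * t) / r)
```

When $T = R^{-1}$ exactly, both corrections vanish and each form reduces to the first-order value $\Phi(R)$, which makes the role of $T$ as an adjustment to the normal approximation explicit.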


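Note that in the important special case $\psi = \phi^1$, $\tau(\psi) = l_1(\tilde\phi)$ and

\[ \frac{\det\{J_{ij}(\hat\psi)\}}{Q(\psi)\det\{J_{ij}(\psi)\}} = \frac{\det\{-l_{ij}(\hat\phi)\}}{\det\{-l_{i'j'}(\tilde\phi)\}}, \]

where $\{-l_{i'j'}(\tilde\phi)\}$ $(i', j' = 2, \ldots, d)$ is the $(d - 1) \times (d - 1)$ submatrix of $\{-l_{ij}(\tilde\phi)\}$ corresponding to the nuisance parameters $\phi^2, \ldots, \phi^d$. Generally, $T = R^{-1} + O_p(n^{-1/2})$ for values of $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$, so $R^{-1} - T$ and $R^{-1}\log(RT)$ are both $O_p(n^{-1/2})$ in (3) and (4).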

Although the Peers and Stein differential equations can be difficult to solve in an arbitrary parameterization $\phi$, the equations simplify considerably when orthogonal parameters are used. Tibshirani (1989) noted that if the parameter of interest is $\psi(\phi) = \phi^1$ and the nuisance parameters $\phi^2, \ldots, \phi^d$ are orthogonal to $\phi^1$ in the sense discussed by Cox and Reid (1987), the Peers and Stein equations reduce to

\[ \{i_{11}(\phi)\}^{-1/2}\frac{\partial}{\partial\phi^1}\log\pi(\phi) + \frac{\partial}{\partial\phi^1}\{i_{11}(\phi)\}^{-1/2} = 0, \]

which has solutions of the form

\[ \pi(\phi) \propto \{i_{11}(\phi)\}^{1/2}\,g(\phi^2, \ldots, \phi^d), \]

where $i_{11}(\phi) = E\{-l_{11}(\phi)\}$ and $g(\phi^2, \ldots, \phi^d)$ is an arbitrary positive function of the nuisance parameters.
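A key property of orthogonal parameterizations is that the differences $\tilde\phi^{i'} - \hat\phi^{i'}$ $(i' = 2, \ldots, d)$ between the constrained and overall maximum likelihood estimators of the nuisance parameters are $O_p(n^{-1})$ for values of $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$. Consequently, for these values of $\psi = \phi^1$,

\[ \frac{\pi(\tilde\phi)}{\pi(\hat\phi)} = \left\{\frac{i_{11}(\tilde\phi)}{i_{11}(\hat\phi)}\right\}^{1/2} + O_p(n^{-1}), \tag{5} \]

and

\[ T(\psi) = \frac{1}{l_1(\tilde\phi)}\,\frac{\pi(\tilde\phi)}{\pi(\hat\phi)}\left[\frac{\det\{-l_{ij}(\hat\phi)\}}{\det\{-l_{i'j'}(\tilde\phi)\}}\right]^{1/2} = \frac{1}{l_1(\tilde\phi)}\left[\frac{i_{11}(\tilde\phi)\det\{-l_{ij}(\hat\phi)\}}{i_{11}(\hat\phi)\det\{-l_{i'j'}(\tilde\phi)\}}\right]^{1/2} + O_p(n^{-1}). \tag{6} \]

The approximation in (6) is exact if the function $g(\phi^2, \ldots, \phi^d)$ is constant. When this approximation is used in place of $T$, confidence limits obtained through (2) and (3) retain coverage error of order $O(n^{-1})$. If no nuisance parameters are present and $\psi = \phi$, then

\[ T(\phi) = \frac{1}{l^{(1)}(\phi)}\left[\frac{i(\phi)\{-l^{(2)}(\hat\phi)\}}{i(\hat\phi)}\right]^{1/2}, \]

where $l^{(k)}(\phi) = d^k l(\phi)/d\phi^k$ $(k = 1, 2)$ and $i(\phi) = E\{-l^{(2)}(\phi)\}$.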
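To make the no-nuisance case concrete, the sketch below compares the first approximation in (3) with an exact posterior tail probability. Everything model-specific here is an assumption chosen for illustration: an exponential sample with rate $\theta$, the prior $\pi(\theta) \propto 1/\theta$ (proportional to the square root of the expected information $n/\theta^2$), and the one-parameter reading of (1), $T(\theta) = \{\pi(\theta)/\pi(\hat\theta)\}\{-l^{(2)}(\hat\theta)\}^{1/2}/l^{(1)}(\theta)$. Under these assumptions the posterior is a gamma distribution with integer shape $n$, so the exact tail is a finite Poisson sum.

```python
import math
from statistics import NormalDist

def posterior_tail_exact(theta, n, total):
    """P(Theta > theta | data) for a Gamma(shape=n, rate=total) posterior, integer n."""
    x = theta * total
    term, acc = math.exp(-x), 0.0
    for k in range(n):               # sum_{k<n} exp(-x) x^k / k!
        acc += term
        term *= x / (k + 1)
    return acc

def posterior_tail_approx(theta, n, total):
    """Phi(R) + phi(R)(1/R - T) for the exponential-rate model with prior 1/theta."""
    std = NormalDist()
    mle = n / total
    w = 2.0 * (n * math.log(mle / theta) - (mle - theta) * total)
    r = math.copysign(math.sqrt(max(w, 0.0)), mle - theta)
    score = n / theta - total                     # l'(theta)
    prior_ratio = mle / theta                     # pi(theta)/pi(mle) for pi = 1/theta
    t = prior_ratio * (math.sqrt(n) / mle) / score  # {-l''(mle)}^{1/2} = sqrt(n)/mle
    return std.cdf(r) + std.pdf(r) * (1.0 / r - t)
```

The approximation is undefined at $\theta = \hat\theta$, where $R = 0$; in practice the limits of interest lie away from that point.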

DiCiccio and Martin (1992) considered use of approximation (6) in equations (4) to construct confidence limits. By comparison of this method with related procedures of Barndorff-Nielsen (1986, 1991) that improve the standard normal approximation to the conditional distribution of the signed root of the likelihood ratio statistic, they showed the approximate confidence limits derived using the Peers and Stein priors have coverage error of order $O(n^{-1})$ conditionally as well as unconditionally. Indeed, the limits obtained from the Bayesian approach differ from Barndorff-Nielsen's limits by terms of order $O_p(n^{-3/2})$. Although confidence limits produced by Barndorff-Nielsen's approximations have conditional coverage error of order $O(n^{-3/2})$, his procedures generally require specification of statistics that are exactly or approximately ancillary. Ancillary statistics are not necessary for approximation (6). However, an obstacle to the straightforward use of (6) is that it requires orthogonal parameters, and orthogonal parameterizations are often inconvenient or difficult to find in practice, though they always exist. An approximation to $T$ that does not directly involve such special parameterizations is developed in Section 2 of the present paper. The derivation of this approximation exploits the simple form of the solutions to the Peers and Stein equations in the orthogonal case; however, the need for explicit knowledge of an orthogonal parameterization is avoided by approximating the Jacobian of a transformation to orthogonality.

The use of equations (4) in the absence of nuisance parameters has also been considered by Barndorff-Nielsen and Chamberlain (1992).

Section 3 concerns the construction of improved confidence limits by methods similar to (3) that involve the signed root of the Cox and Reid (1987) conditional likelihood ratio statistic. For certain situations, particularly when the number of nuisance parameters is large, the standard normal approximation to the distribution of the signed root of the profile likelihood ratio statistic can be extremely poor, and approximate confidence limits obtained by solving equations (4) can have true coverage far from the nominal levels. In these cases, the distribution of the signed root of the conditional likelihood ratio statistic tends to be closer to the standard normal, and solving the equations analogous to (4) that are derived in Section 3 tends to produce approximate confidence limits with more accurate coverage. Although the methods developed in Section 3 offer better coverage accuracy, they are also computationally more difficult to implement in general. Cox and Reid (1987) defined their conditional likelihood ratio statistic in terms of a conditional profile likelihood function that requires orthogonal parameters. Section 3 contains an approximation to this function having error of order $O_p(n^{-1})$ that can be computed in any parameterization.

Some  examples  involving  exponential  families  are  considered  in  Section  4. 

2.  Approximation  of  T 

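Consider a reparameterization $\lambda(\phi) = (\lambda^1, \ldots, \lambda^d)$ such that $\lambda^1 = \psi$ is the scalar parameter of interest and the nuisance parameters $\lambda^2, \ldots, \lambda^d$ are orthogonal to $\lambda^1$. Suppose that a prior density $\bar\pi(\lambda)$ is assumed for the orthogonal parameterization $\lambda$. The corresponding prior density for the original parameterization $\phi$ is

\[ \pi(\phi) = \bar\pi\{\lambda(\phi)\}\,|\det\{\lambda^a_i(\phi)\}| = \bar\pi\{\lambda(\phi)\}\,|\det[\phi^i_a\{\lambda(\phi)\}]|^{-1}, \]

where $\lambda^a_i(\phi) = \partial\lambda^a(\phi)/\partial\phi^i$ and $\phi^i_a(\lambda) = \partial\phi^i(\lambda)/\partial\lambda^a$ $(a, i = 1, \ldots, d)$, so $[\phi^i_a\{\lambda(\phi)\}]$ is the $d \times d$ matrix inverse of $\{\lambda^a_i(\phi)\}$. With this prior density for $\phi$, expression (1) for $T$ becomes

\[ T(\psi) = \frac{1}{\tau(\psi)}\,\frac{\bar\pi(\tilde\lambda)}{\bar\pi(\hat\lambda)}\left[\frac{\det\{J_{ij}(\hat\psi)\}}{Q(\psi)\det\{J_{ij}(\psi)\}}\right]^{1/2}\frac{\det\{\phi^i_a(\hat\lambda)\}}{\det\{\phi^i_a(\tilde\lambda)\}}, \tag{7} \]

since $\tilde\lambda = \lambda(\tilde\phi)$ and $\hat\lambda = \lambda(\hat\phi)$. Thus, to approximate $T$ requires knowledge of the ratios $\bar\pi(\tilde\lambda)/\bar\pi(\hat\lambda)$ and $\det\{\phi^i_a(\hat\lambda)\}/\det\{\phi^i_a(\tilde\lambda)\}$. Approximations to these ratios are given in formulae (10) and (14), respectively. The approximation to $\bar\pi(\tilde\lambda)/\bar\pi(\hat\lambda)$ applies in the case that $\bar\pi(\lambda)$ is a solution to the Peers and Stein differential equations.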

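Denote the log likelihood function for $\lambda$ by $\bar l(\lambda)$, let $\bar l_{ab}(\lambda) = \partial^2\bar l(\lambda)/\partial\lambda^a\partial\lambda^b$, and define $i_{ij}(\phi) = E\{-l_{ij}(\phi)\}$, $\bar i_{ab}(\lambda) = E\{-\bar l_{ab}(\lambda)\}$ $(a, b, i, j = 1, \ldots, d)$. Differentiating the identity $\bar l(\lambda) = l\{\phi(\lambda)\}$ twice and taking expectations yields

\[ \bar i_{ab}(\lambda) = i_{ij}\{\phi(\lambda)\}\phi^i_a(\lambda)\phi^j_b(\lambda) \quad (a, b = 1, \ldots, d), \]

and hence,

\[ \bar i^{ab}(\lambda) = i^{ij}\{\phi(\lambda)\}\lambda^a_i\{\phi(\lambda)\}\lambda^b_j\{\phi(\lambda)\} \quad (a, b = 1, \ldots, d), \tag{8} \]

where $\{\bar i^{ab}(\lambda)\}$ and $\{i^{ij}(\phi)\}$ are the $d \times d$ matrix inverses of $\{\bar i_{ab}(\lambda)\}$ and $\{i_{ij}(\phi)\}$, respectively. In particular, for $a = b = 1$, (8) becomes

\[ \bar i^{11}(\lambda) = i^{ij}\{\phi(\lambda)\}\psi_i\{\phi(\lambda)\}\psi_j\{\phi(\lambda)\}. \tag{9} \]

By definition, orthogonality of $\lambda^2, \ldots, \lambda^d$ to $\lambda^1 = \psi$ means that $\bar i_{a'1}(\lambda) = \bar i_{1a'}(\lambda) = 0$ $(a' = 2, \ldots, d)$. Two important consequences of orthogonality are $\bar i^{a'1}(\lambda) = \bar i^{1a'}(\lambda) = 0$ $(a' = 2, \ldots, d)$ and $\bar i^{11}(\lambda) = \{\bar i_{11}(\lambda)\}^{-1}$.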
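Suppose that $\bar\pi(\lambda)$ is a prior of the Peers and Stein type. Since $\tilde\phi = \phi(\tilde\lambda)$ and $\hat\phi = \phi(\hat\lambda)$, it follows from (5) and (9) that the ratio $\bar\pi(\tilde\lambda)/\bar\pi(\hat\lambda)$ can be approximated in the $\phi$ parameterization by

\[ \frac{\bar\pi(\tilde\lambda)}{\bar\pi(\hat\lambda)} = \left\{\frac{i^{ij}(\hat\phi)\psi_i(\hat\phi)\psi_j(\hat\phi)}{i^{ij}(\tilde\phi)\psi_i(\tilde\phi)\psi_j(\tilde\phi)}\right\}^{1/2} + O_p(n^{-1}) \tag{10} \]

for values of the parameter $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$.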


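It follows from (8) that

\[ \bar i^{ab}(\lambda)\phi^i_b(\lambda) = i^{ij}\{\phi(\lambda)\}\lambda^a_j\{\phi(\lambda)\} \quad (a, i = 1, \ldots, d), \tag{11} \]

and setting $a = 1$ in (11) gives

\[ \bar i^{1b}(\lambda)\phi^i_b(\lambda) = i^{ij}\{\phi(\lambda)\}\psi_j\{\phi(\lambda)\} \quad (i = 1, \ldots, d). \tag{12} \]

Combining (9) and (12) shows that for an orthogonal reparameterization $\lambda(\phi)$ with $\lambda^1 = \psi$ the parameter of interest, $\phi^i_1(\lambda) = \partial\phi^i(\lambda)/\partial\lambda^1 = r^i\{\phi(\lambda)\}$ $(i = 1, \ldots, d)$, where

\[ r^i(\phi) = i^{ij}(\phi)\kappa_j(\phi), \qquad \kappa_j(\phi) = \frac{\psi_j(\phi)}{i^{kl}(\phi)\psi_k(\phi)\psi_l(\phi)}. \]

The converse of this result also holds: if a reparameterization $\lambda(\phi)$ satisfies $\phi^i_1(\lambda) = r^i\{\phi(\lambda)\}$ $(i = 1, \ldots, d)$, then $\lambda^2, \ldots, \lambda^d$ are orthogonal to $\lambda^1 = \psi$. When $\psi = \phi^1$ is the scalar parameter of interest, $r^i(\phi) = i^{i1}(\phi)/i^{11}(\phi)$ $(i = 1, \ldots, d)$. Then the conditions $\phi^i_1(\lambda) = r^i\{\phi(\lambda)\}$ $(i = 1, \ldots, d)$ for orthogonality are equivalent to $i_{i'j}\{\phi(\lambda)\}\phi^j_1(\lambda) = 0$ $(i' = 2, \ldots, d)$, or

\[ \sum_{j'=2}^{d} i_{i'j'}\{\phi(\lambda)\}\phi^{j'}_1(\lambda) = -i_{i'1}\{\phi(\lambda)\} \quad (i' = 2, \ldots, d), \]

which are the orthogonality equations derived by Cox and Reid (1987).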
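The direction $r^i(\phi)$ can be computed numerically from an expected information matrix and the gradient of $\psi$, taking $r^i(\phi) = i^{ij}(\phi)\psi_j(\phi)/\{i^{kl}(\phi)\psi_k(\phi)\psi_l(\phi)\}$ as our reading of the result obtained from (9) and (12). The sketch below is a minimal illustration: the information matrix is a hypothetical numerical example, and the small linear solver is included only to keep the code free of external dependencies.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def direction(info, grad_psi):
    """r^i = i^{ij} psi_j / (i^{kl} psi_k psi_l)."""
    v = solve(info, grad_psi)                      # v^i = i^{ij} psi_j
    scale = sum(g * vi for g, vi in zip(grad_psi, v))
    return [vi / scale for vi in v]

# Hypothetical expected information matrix; psi = phi^1 so grad psi = (1, 0, 0)
info = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
r = direction(info, [1.0, 0.0, 0.0])
```

For $\psi = \phi^1$ the first component of `r` equals one and the remaining rows of the information matrix annihilate `r`, which is the numerical counterpart of the Cox and Reid orthogonality equations.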
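Now differentiation of the identities $\phi^i_1(\lambda) = r^i\{\phi(\lambda)\}$ $(i = 1, \ldots, d)$ with respect to $\lambda^a$ produces the system of equations

\[ \partial\phi^i_a(\lambda)/\partial\lambda^1 = r^i_j\{\phi(\lambda)\}\phi^j_a(\lambda) \quad (a, i = 1, \ldots, d), \]

where $r^i_j(\phi) = \partial r^i(\phi)/\partial\phi^j$. Theorem 7.3 of Coddington and Levinson (1955, p. 28) shows that the solutions $\phi^i_a(\lambda)$ $(a, i = 1, \ldots, d)$ to this system have the property

\[ \det\{\phi^i_a(\lambda^1, \lambda^2, \ldots, \lambda^d)\} = \det\{\phi^i_a(\sigma, \lambda^2, \ldots, \lambda^d)\}\exp\left[\int_\sigma^{\lambda^1}\mathrm{tr}\,r[\phi\{s, \lambda^2, \ldots, \lambda^d\}]\,ds\right], \tag{13} \]

where $\sigma$ is an arbitrary constant and $r(\phi) = \{r^i_j(\phi)\}$ is a $d \times d$ matrix. By choosing $\lambda^2 = \tilde\lambda^2(\lambda^1), \ldots, \lambda^d = \tilde\lambda^d(\lambda^1)$ and $\sigma = \hat\lambda^1$, the left-hand side of (13) is

\[ \det[\phi^i_a\{\lambda^1, \tilde\lambda^2(\lambda^1), \ldots, \tilde\lambda^d(\lambda^1)\}] = \det\{\phi^i_a(\tilde\lambda)\}, \]

while the factors on the right-hand side are

\[ \det[\phi^i_a\{\hat\lambda^1, \tilde\lambda^2(\lambda^1), \ldots, \tilde\lambda^d(\lambda^1)\}] = \det\{\phi^i_a(\hat\lambda)\} + O_p(n^{-1}), \]

\[ \exp\left(\int_{\hat\lambda^1}^{\lambda^1}\mathrm{tr}\,r[\phi\{s, \tilde\lambda^2(\lambda^1), \ldots, \tilde\lambda^d(\lambda^1)\}]\,ds\right) = \exp\left(\int_{\hat\lambda^1}^{\lambda^1}\mathrm{tr}\,r[\phi\{\tilde\lambda(s)\}]\,ds\right) + O_p(n^{-1}) = \exp\left(\int_{\hat\psi}^{\psi}\mathrm{tr}\,r\{\tilde\phi(s)\}\,ds\right) + O_p(n^{-1}), \]

for values of $\lambda^1$ such that $\lambda^1 - \hat\lambda^1$ is $O_p(n^{-1/2})$. Therefore, the ratio $\det\{\phi^i_a(\hat\lambda)\}/\det\{\phi^i_a(\tilde\lambda)\}$ can be approximated in the $\phi$ parameterization by

\[ \frac{\det\{\phi^i_a(\hat\lambda)\}}{\det\{\phi^i_a(\tilde\lambda)\}} = \exp\left[\int_\psi^{\hat\psi}\mathrm{tr}\,r\{\tilde\phi(s)\}\,ds\right] + O_p(n^{-1}) \tag{14} \]

for values of the parameter $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$. Since the ratio is $1 + O_p(n^{-1/2})$, other approximations having error of order $O_p(n^{-1})$ are

\[ \exp\{(\hat\psi - \psi)\,\mathrm{tr}\,r(\hat\phi)\}, \qquad \exp\{(\hat\psi - \psi)\,\mathrm{tr}\,r(\tilde\phi)\}. \]

The advantage of these approximations over (14) is computational simplicity.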
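A one-dimensional toy calculation may help to show how the integral form and the two simpler endpoint forms of the Jacobian approximation relate. The setting is hypothetical and chosen only so that everything is available in closed form: $d = 1$ and $\psi(\phi) = e^{\phi}$, for which $r(\phi) = 1/\psi'(\phi) = e^{-\phi}$, the trace evaluated along the path is $-1/s$, and the exact Jacobian ratio is $\psi/\hat\psi$. Simpson's rule stands in for whatever quadrature would be used in practice.

```python
import math

def simpson(f, a, b, panels=200):
    """Composite Simpson rule with an even number of panels."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def trace_r(s):
    """tr r at phi_tilde(s) = log(s) in the toy setting psi(phi) = exp(phi)."""
    return -1.0 / s

psi_hat, psi = 2.0, 1.8                      # hypothetical values
exact = psi / psi_hat                        # (dphi/dpsi at psi_hat) / (dphi/dpsi at psi)
via_integral = math.exp(simpson(trace_r, psi, psi_hat))
via_hat = math.exp((psi_hat - psi) * trace_r(psi_hat))
via_tilde = math.exp((psi_hat - psi) * trace_r(psi))
```

In this toy case the integral form reproduces the ratio essentially exactly, while the two endpoint forms differ from it only at second order in $\hat\psi - \psi$.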

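In summary, the approximation to $T(\psi)$ that emerges from (7), (10) and (14) is

\[ T(\psi) = \frac{1}{\tau(\psi)}\left[\frac{i^{ij}(\hat\phi)\psi_i(\hat\phi)\psi_j(\hat\phi)}{i^{ij}(\tilde\phi)\psi_i(\tilde\phi)\psi_j(\tilde\phi)}\,\frac{\det\{J_{ij}(\hat\psi)\}}{Q(\psi)\det\{J_{ij}(\psi)\}}\right]^{1/2}\exp\left[\int_\psi^{\hat\psi}\mathrm{tr}\,r\{\tilde\phi(s)\}\,ds\right] + O_p(n^{-1}), \tag{15} \]

and the same order of error holds if the integral is replaced by either $(\hat\psi - \psi)\,\mathrm{tr}\,r(\hat\phi)$ or $(\hat\psi - \psi)\,\mathrm{tr}\,r(\tilde\phi)$. If approximation (15) is used for $T$, then the confidence limits obtained through (2) and (3) have coverage error of order $O(n^{-1})$, both conditionally and unconditionally. When the parameterization $\phi$ is orthogonal, $\mathrm{tr}\,r(\phi) = 0$ and (15) reduces to approximation (6).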

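An alternative formula for $\mathrm{tr}\,r(\phi)$ is now developed that may be useful in computing (15). Differentiation of the identities $r^i(\phi) = i^{ij}(\phi)\kappa_j(\phi)$ $(i = 1, \ldots, d)$ with respect to $\phi^l$ yields

\[ r^i_l(\phi) = i^{ij}(\phi)\kappa_{j/l}(\phi) - i^{ij}(\phi)i^{km}(\phi)i_{jk/l}(\phi)\kappa_m(\phi) \quad (i, l = 1, \ldots, d), \]

where $\kappa_{j/l}(\phi) = \partial\kappa_j(\phi)/\partial\phi^l$, $i_{jk/l}(\phi) = \partial i_{jk}(\phi)/\partial\phi^l$, and the result $\partial i^{ij}(\phi)/\partial\phi^l = -i^{ik}(\phi)i^{jm}(\phi)i_{km/l}(\phi)$ is used $(i, j, k, l, m = 1, \ldots, d)$. Thus,

\[ \mathrm{tr}\,r(\phi) = i^{ij}(\phi)\kappa_{j/i}(\phi) - i^{ij}(\phi)i^{km}(\phi)i_{jk/i}(\phi)\kappa_m(\phi). \tag{16} \]

When $\psi = \phi^1$ is the scalar parameter of interest, $\kappa_j(\phi) = \delta^1_j/i^{11}(\phi)$ $(j = 1, \ldots, d)$, where $\delta^1_j$ is Kronecker's delta, and (16) yields

\[ \mathrm{tr}\,r(\phi) = -\{i^{ij}(\phi) - i^{i1}(\phi)i^{j1}(\phi)/i^{11}(\phi)\}\,i_{jk/i}(\phi)\,i^{k1}(\phi)/i^{11}(\phi). \tag{17} \]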
A desirable property of any procedure for inference about $\psi$ is that it not depend on the particular choice of underlying parameter $\phi$. Confidence limits obtained from the standard normal approximation for the signed root of the likelihood ratio statistic have this property, since $R(\psi)$ is invariant under reparameterization. Furthermore, for fixed prior density $\pi(\phi)$, the quantities $\tau(\psi)$ and $T(\psi)$ defined at (1) are invariant, and the approximate posterior quantiles of $\psi$ derived from (2) and (3) also do not depend on choice of underlying parameter. When (2) and (3) are used to construct approximate confidence limits, with $T(\psi)$ given by (1) and $\pi(\phi)$ taken to be a solution to the Peers and Stein differential equations, some arbitrariness can arise because the solutions to these equations are not unique in general. However, the confidence limits obtained from using different solutions agree at least to order $O_p(n^{-1})$, that is, they differ by at most $O_p(n^{-3/2})$. Approximation (6) to $T(\psi)$, requiring an orthogonal parameterization, is not invariant since it depends on the orthogonal parameterization used. This approximation is independent of the choice of orthogonal parameters to error of order $O_p(n^{-1})$, however, and when it is used in (2) and (3) to construct approximate confidence limits, the resulting limits are parameterization invariant to error of order $O_p(n^{-3/2})$. Similar comments apply to approximation (15). This approximation to $T(\psi)$ depends on the choice of parameterization $\phi$ at order $O_p(n^{-1})$, and it can be used in (2) and (3) to produce approximate confidence limits that are parameterization dependent only at order $O_p(n^{-3/2})$.

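It is worthwhile to remark that the quantities $r^i(\phi)$ $(i = 1, \ldots, d)$ have another interpretation. A straightforward calculation involving Lagrange multipliers shows that

\[ d\tilde\phi^i(\psi)/d\psi = J^{ij}(\psi)\psi_j(\tilde\phi)/Q(\psi) \quad (i = 1, \ldots, d). \]

In particular,

\[ d\tilde\phi^i(\psi)/d\psi\,\big|_{\psi = \hat\psi} = J^{ij}(\hat\psi)\psi_j(\hat\phi)/Q(\hat\psi) \quad (i = 1, \ldots, d), \]

and thus, for values of $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$, $d\tilde\phi^i(\psi)/d\psi = r^i(\tilde\phi) + O_p(n^{-1/2})$.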
3.  Procedures  based  on  conditional  profile  likelihood


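Suppose, as in Section 2, that a prior density $\bar\pi(\lambda)$ is considered for an orthogonal reparameterization $\lambda = \lambda(\phi)$. Then $T(\psi)$ is given by (7), and the Laplace approximation (2) to the posterior density of $\psi$ can be written as

\[ f_{\psi|X}(\psi) \doteq c\,\frac{\bar\pi(\tilde\lambda)}{\bar\pi(\hat\lambda)}\exp\{l_c(\psi) - l_c(\hat\psi_c)\}, \tag{18} \]

where

\[ l_c(\psi) = l_p(\psi) - \frac{1}{2}\log\left[\frac{Q(\psi)\det\{J_{ij}(\psi)\}}{Q(\hat\psi)\det\{J_{ij}(\hat\psi)\}}\right] - \log\left[\frac{\det\{\phi^i_a(\tilde\lambda)\}}{\det\{\phi^i_a(\hat\lambda)\}}\right], \]

$l_c(\psi)$ is maximized at $\hat\psi_c$, and $c$ is a normalizing constant. Applying a tail probability approximation of DiCiccio and Martin (1991) to (18) yields

\[ 1 - F_{\psi|X}(\psi) \doteq \Phi(R_c) + \varphi(R_c)(R_c^{-1} - T_c), \qquad 1 - F_{\psi|X}(\psi) \doteq \Phi\{R_c - R_c^{-1}\log(R_cT_c)\}, \tag{19} \]

where $R_c = R_c(\psi) = \mathrm{sgn}(\hat\psi_c - \psi)[2\{l_c(\hat\psi_c) - l_c(\psi)\}]^{1/2}$,

\[ T_c = T_c(\psi) = \frac{\bar\pi(\tilde\lambda)}{\bar\pi(\hat\lambda_c)}\,\frac{\{-l_c^{(2)}(\hat\psi_c)\}^{1/2}}{l_c^{(1)}(\psi)}, \]

$\hat\lambda_c = \tilde\lambda(\hat\psi_c)$, and $l_c^{(k)}(\psi) = d^k l_c(\psi)/d\psi^k$ $(k = 1, 2)$. Both approximations in (19) have errors of order $O_p(n^{-3/2})$ for arguments $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$. It follows that if $\bar\pi(\lambda)$ is a solution of the Peers and Stein differential equations, then the solutions of the equations

\[ \Phi(R_c) + \varphi(R_c)(R_c^{-1} - T_c) = \alpha, \qquad \Phi\{R_c - R_c^{-1}\log(R_cT_c)\} = \alpha, \tag{20} \]

are approximate upper $1 - \alpha$ confidence limits having coverage error of order $O(n^{-1})$ conditionally as well as unconditionally. If the prior density $\bar\pi(\lambda)$ produces posterior quantiles for $\psi$ that are approximate confidence limits having coverage error of order $O(n^{-3/2})$ or smaller, then the confidence limits obtained from equations (20) have coverage error of order $O(n^{-3/2})$.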
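The function $l_c(\psi)$ is identical to the conditional profile likelihood function of Cox and Reid (1987). In terms of an orthogonal parameterization $\lambda$ with log likelihood function $\bar l(\lambda)$, Cox and Reid have recommended replacing the objective function $l_p(\psi)$ by

\[ l_p(\psi) - \frac{1}{2}\log\left[\frac{\det\{-\bar l_{a'b'}(\tilde\lambda)\}}{\det\{-\bar l_{a'b'}(\hat\lambda)\}}\right], \tag{21} \]

where $\{-\bar l_{a'b'}(\lambda)\}$ is the $(d - 1) \times (d - 1)$ submatrix of $\{-\bar l_{ab}(\lambda)\}$ corresponding to the nuisance parameters $\lambda^2, \ldots, \lambda^d$. To demonstrate the equality of $l_c(\psi)$ and (21), note that

\[ \det\{-\bar l_{a'b'}(\lambda)\} = \{-\bar l^{11}(\lambda)\}\det\{-\bar l_{ab}(\lambda)\}, \tag{22} \]

where $\{-\bar l^{ab}(\lambda)\}$ is the $d \times d$ matrix inverse of $\{-\bar l_{ab}(\lambda)\}$. Differentiation of the identity $\bar l(\lambda) = l\{\phi(\lambda)\}$ gives

\[ -\bar l_{ab}(\tilde\lambda) = J_{ij}(\psi)\phi^i_a(\tilde\lambda)\phi^j_b(\tilde\lambda), \qquad -\bar l^{ab}(\tilde\lambda) = J^{ij}(\psi)\lambda^a_i(\tilde\phi)\lambda^b_j(\tilde\phi) \quad (a, b = 1, \ldots, d); \]

hence,

\[ \det\{-\bar l_{ab}(\tilde\lambda)\} = \det\{J_{ij}(\psi)\}[\det\{\phi^i_a(\tilde\lambda)\}]^2, \qquad -\bar l^{11}(\tilde\lambda) = Q(\psi). \tag{23} \]

Combining (22) and (23) gives

\[ \left[\frac{\det\{-\bar l_{a'b'}(\tilde\lambda)\}}{\det\{-\bar l_{a'b'}(\hat\lambda)\}}\right]^{1/2} = \left[\frac{Q(\psi)\det\{J_{ij}(\psi)\}}{Q(\hat\psi)\det\{J_{ij}(\hat\psi)\}}\right]^{1/2}\frac{\det\{\phi^i_a(\tilde\lambda)\}}{\det\{\phi^i_a(\hat\lambda)\}}, \]

and it follows that $l_c(\psi)$ and (21) are the same.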
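A familiar special case, not taken from the paper, illustrates the effect of the adjustment in (21). For a normal sample with interest parameter $\sigma^2$ and nuisance mean $\mu$ (which is orthogonal to $\sigma^2$), the profile log likelihood is maximized at the sum of squares divided by $n$, whereas subtracting $\frac{1}{2}\log\{-l_{\mu\mu}\} = \frac{1}{2}\log(n/\sigma^2)$ moves the maximizer to the sum of squares divided by $n - 1$. The sketch below checks this numerically; the data and function names are hypothetical, and additive constants that do not depend on $\sigma^2$ are dropped.

```python
import math

def profile_loglik(sigma2, n, ss):
    """Profile log likelihood for sigma^2 (mean maximized out), up to a constant."""
    return -0.5 * n * math.log(sigma2) - 0.5 * ss / sigma2

def adjusted_loglik(sigma2, n, ss):
    """Profile log likelihood minus half the log of the nuisance information n/sigma^2."""
    return profile_loglik(sigma2, n, ss) - 0.5 * math.log(n / sigma2)

def argmax(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

data = [4.1, 5.3, 3.8, 6.0, 4.7, 5.5]          # hypothetical sample
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)
sigma2_profile = argmax(lambda s: profile_loglik(s, n, ss), 1e-3, 100.0)
sigma2_adjusted = argmax(lambda s: adjusted_loglik(s, n, ss), 1e-3, 100.0)
```

The shift from divisor $n$ to divisor $n - 1$ is the simplest instance of the conditional profile likelihood correcting for estimation of nuisance parameters.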

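Calculation of $l_c(\psi)$ requires knowledge of the orthogonal parameterization $\lambda$. However, by using (14), $l_c(\psi)$ can be approximated by a function that does not explicitly involve orthogonal parameters. For values of $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$,

\[ l_c(\psi) = l_p(\psi) - \frac{1}{2}\log\left[\frac{Q(\psi)\det\{J_{ij}(\psi)\}}{Q(\hat\psi)\det\{J_{ij}(\hat\psi)\}}\right] + \int_\psi^{\hat\psi}\mathrm{tr}\,r\{\tilde\phi(s)\}\,ds + O_p(n^{-1}), \tag{24} \]

and the integral can be replaced by either $(\hat\psi - \psi)\,\mathrm{tr}\,r(\hat\phi)$ or $(\hat\psi - \psi)\,\mathrm{tr}\,r(\tilde\phi)$ without changing the order of the error term. Similarly, when $\bar\pi(\lambda)$ is a prior of the Peers and Stein type, it follows from (9) that $T_c(\psi)$ can be approximated by

\[ T_c(\psi) = \left\{\frac{i^{ij}(\hat\phi_c)\psi_i(\hat\phi_c)\psi_j(\hat\phi_c)}{i^{ij}(\tilde\phi)\psi_i(\tilde\phi)\psi_j(\tilde\phi)}\right\}^{1/2}\frac{\{-l_c^{(2)}(\hat\psi_c)\}^{1/2}}{l_c^{(1)}(\psi)} + O_p(n^{-1}), \tag{25} \]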
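where $\hat\phi_c = \tilde\phi(\hat\psi_c)$. Since $\hat\psi_c = \hat\psi + O_p(n^{-1})$, $-l_c^{(2)}(\hat\psi_c) = -l_p^{(2)}(\hat\psi) + O_p(1)$, and $-l_p^{(2)}(\hat\psi) = \{Q(\hat\psi)\}^{-1}$, the computationally simpler approximation

\[ T_c(\psi) = \frac{1}{l_c^{(1)}(\psi)}\left[\frac{i^{ij}(\hat\phi)\psi_i(\hat\phi)\psi_j(\hat\phi)}{Q(\hat\psi)\,i^{ij}(\tilde\phi)\psi_i(\tilde\phi)\psi_j(\tilde\phi)}\right]^{1/2} + O_p(n^{-1}) \tag{26} \]

also holds. The solutions to equations (20) remain upper confidence limits with conditional coverage $1 - \alpha + O(n^{-1})$ when either approximation (25) or (26) is used for $T_c(\psi)$ and approximation (24) is used for $l_c(\psi)$ in calculating $\hat\psi_c$, $R_c(\psi)$ and $T_c(\psi)$.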

The function $l_c(\psi)$ depends on the choice of orthogonal parameterization; however, for arguments $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$, the values of $l_c(\psi)$ obtained under different orthogonal parameterizations vary by at most $O_p(n^{-1})$. The same property holds for the function $T_c(\psi)$, when the prior density is completely specified. Approximate confidence limits obtained by solving the equations in (20) depend on both the orthogonal parameterization and the solution to the Peers and Stein equations used. However, all the limits thus derived agree up to order $O_p(n^{-1})$; that is, they differ from one another by terms of order $O_p(n^{-3/2})$. Similar comments apply for approximations (24), (25) and (26). Approximation (24) to $l_c(\psi)$ depends on the underlying parameterization $\phi$ at order $O_p(n^{-1})$ for values of $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$. When (24) is used in (25) or (26) to approximate $T_c(\psi)$, the resulting approximation is also parameterization invariant to error of order $O_p(n^{-1})$. Using these approximations for $l_c(\psi)$ and $T_c(\psi)$ in (20) produces approximate confidence limits that depend on the parameterization $\phi$ only at order $O_p(n^{-3/2})$.


4.  Examples 

The methods of Sections 2 and 3 are now illustrated in some simple situations where orthogonal parameterizations are readily available, so that comparisons with other available procedures are possible.

Consider a sample $X_1, \ldots, X_n$ from a $d$-dimensional exponential family. Assume the log likelihood function for the canonical parameter $\phi = (\phi^1, \ldots, \phi^d)$ is $l(\phi) = n\{\phi^i\bar X^i - \beta(\phi)\}$, where $\bar X = (\bar X^1, \ldots, \bar X^d) = n^{-1}\sum X_j$, and let $\beta_i(\phi) = \partial\beta(\phi)/\partial\phi^i$, $\beta_{ij}(\phi) = \partial^2\beta(\phi)/\partial\phi^i\partial\phi^j$ $(i, j = 1, \ldots, d)$. Then $E(\bar X^i) = \beta_i(\phi)$ $(i = 1, \ldots, d)$, and in the notation of Section 2, $-l_{ij}(\phi) = i_{ij}(\phi) = n\beta_{ij}(\phi)$ and $i^{ij}(\phi) = n^{-1}\beta^{ij}(\phi)$ $(i, j = 1, \ldots, d)$, where $\{\beta^{ij}(\phi)\}$ is the $d \times d$ matrix inverse of $\{\beta_{ij}(\phi)\}$.

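Example 1. Suppose that the scalar parameter of interest is $\psi = \phi^1$. Cox and Reid (1987) have noted that $\lambda = (\lambda^1, \ldots, \lambda^d) = \{\phi^1, \beta_2(\phi), \ldots, \beta_d(\phi)\}$ is an orthogonal parameterization, and $T(\psi)$ can be calculated from (7) using this parameterization. By orthogonality and (9), the Peers and Stein prior densities are of the form $\bar\pi(\lambda) \propto [\beta^{11}\{\phi(\lambda)\}]^{-1/2}g(\lambda^2, \ldots, \lambda^d)$. If the function $g(\lambda^2, \ldots, \lambda^d)$ is chosen to be constant, then $\bar\pi(\tilde\lambda)/\bar\pi(\hat\lambda) = \{\beta^{11}(\hat\phi)/\beta^{11}(\tilde\phi)\}^{1/2}$. Now define $B(\phi) = \{\beta_{ij}(\phi)\}$ and $B_1(\phi) = \{\beta_{i'j'}(\phi)\}$ $(i', j' = 2, \ldots, d)$, so that $B_1(\phi)$ is the $(d - 1) \times (d - 1)$ submatrix of $B(\phi)$ corresponding to the nuisance parameters $\phi^2, \ldots, \phi^d$. Since $\det\{\lambda^a_i(\phi)\} = \det B_1(\phi)$, it follows that $\det\{\phi^i_a(\hat\lambda)\}/\det\{\phi^i_a(\tilde\lambda)\} = \det B_1(\tilde\phi)/\det B_1(\hat\phi)$. Direct calculation shows that $\tau(\psi) = l_1(\tilde\phi) = n\{\beta_1(\hat\phi) - \beta_1(\tilde\phi)\}$, $\det\{J_{ij}(\psi)\} = n^d\det B(\tilde\phi)$, and $Q(\psi) = n^{-1}\beta^{11}(\tilde\phi)$. Hence, (7) gives

\[ T(\psi) = \frac{1}{n^{1/2}\{\beta_1(\hat\phi) - \beta_1(\tilde\phi)\}}\left\{\frac{\det B(\tilde\phi)}{\det B_1(\hat\phi)}\right\}^{1/2}. \tag{27} \]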
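For other choices of $g(\lambda^2, \ldots, \lambda^d)$ in the prior density $\bar\pi(\lambda)$, (10) ensures that (27) is an approximation to (7) having error of order $O_p(n^{-1})$ for values of $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$.

Approximation (15) to $T(\psi)$, which does not require knowledge of orthogonal parameters, coincides with (27), because approximation (14) is exact in this case. The identity

\[ \exp\left[\int_\psi^{\hat\psi}\mathrm{tr}\,r\{\tilde\phi(s)\}\,ds\right] = \frac{\det B_1(\tilde\phi)}{\det B_1(\hat\phi)} \tag{28} \]

follows from expression (17). Since $i_{jk/i}(\phi) = n\beta_{ijk}(\phi)$ $(i, j, k = 1, \ldots, d)$, where $\beta_{ijk}(\phi) = \partial^3\beta(\phi)/\partial\phi^i\partial\phi^j\partial\phi^k$, and since $d\tilde\phi^i(\psi)/d\psi = r^i(\tilde\phi)$ $(i = 1, \ldots, d)$, (17) yields

\[ \mathrm{tr}\,r\{\tilde\phi(\psi)\} = -\{\beta^{ij}(\tilde\phi) - \beta^{i1}(\tilde\phi)\beta^{j1}(\tilde\phi)/\beta^{11}(\tilde\phi)\}\beta_{ijk}(\tilde\phi)\beta^{k1}(\tilde\phi)/\beta^{11}(\tilde\phi) = -\mathrm{tr}\left[\{B_1(\tilde\phi)\}^{-1}\frac{d}{d\psi}B_1\{\tilde\phi(\psi)\}\right] = -\frac{d}{d\psi}\log\det B_1\{\tilde\phi(\psi)\}. \tag{29} \]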

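For this choice of orthogonal parameters, the conditional profile likelihood function is

\[ l_c(\psi) = l_p(\psi) + \frac{1}{2}\log\left\{\frac{\det B_1(\tilde\phi)}{\det B_1(\hat\phi)}\right\}, \]

which was derived by Barndorff-Nielsen and Cox (1979). It follows from (28) that approximation (24) to $l_c(\psi)$ is exact. If the prior density is $\bar\pi(\lambda) \propto [\beta^{11}\{\phi(\lambda)\}]^{-1/2}$, then

\[ T_c(\psi) = \left\{\frac{\beta^{11}(\hat\phi_c)}{\beta^{11}(\tilde\phi)}\right\}^{1/2}\frac{\{-l_c^{(2)}(\hat\psi_c)\}^{1/2}}{l_c^{(1)}(\psi)}; \tag{30} \]

for other prior densities of the Peers and Stein type, (30) is an approximation to $T_c(\psi)$ having error of order $O_p(n^{-1})$ for values of $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$. Approximation (25) to $T_c(\psi)$ agrees with (30), and approximation (26) is

\[ T_c(\psi) = \frac{n^{1/2}}{l_c^{(1)}(\psi)\{\beta^{11}(\tilde\phi)\}^{1/2}} + O_p(n^{-1}). \tag{31} \]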

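It is of interest in the present context to compare the methods from Sections 2 and 3 with similar procedures that have appeared in the literature. For this example, if $T(\psi)$ and $T_c(\psi)$ are replaced by

\[ U(\psi) = \frac{1}{n^{1/2}(\hat\psi - \psi)}\left\{\frac{\det B_1(\tilde\phi)}{\det B(\hat\phi)}\right\}^{1/2}, \qquad U_c(\psi) = \frac{1}{(\hat\psi_c - \psi)\{-l_c^{(2)}(\hat\psi_c)\}^{1/2}} \tag{32} \]

in equations (4) and (20), then the solutions to these equations are approximate upper confidence limits with coverage $1 - \alpha + O(n^{-3/2})$. This use of $U(\psi)$ and $U_c(\psi)$ has been discussed by Barndorff-Nielsen (1986), Skovgaard (1987), Davison (1988), DiCiccio and Martin (1991), Fraser (1991), Fraser, Reid, and Wong (1991), and Pierce and Peters (1992). For values of $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$, $T(\psi)$ and $T_c(\psi)$ differ from $U(\psi)$ and $U_c(\psi)$ by terms of order $O_p(n^{-1})$. Note that $U(\psi)$ is obtained from $T(\psi)$ by replacing $l_1(\tilde\phi)\{\beta^{11}(\tilde\phi)\}^{1/2} = n\{\beta_1(\hat\phi) - \beta_1(\tilde\phi)\}\{\beta^{11}(\tilde\phi)\}^{1/2}$ with the asymptotically equivalent quantity $n(\hat\psi - \psi)\{\beta^{11}(\hat\phi)\}^{-1/2}$, and that $U_c(\psi)$ is obtained from $T_c(\psi)$ by replacing $n^{-1/2}l_c^{(1)}(\psi)\{\beta^{11}(\tilde\phi)\}^{1/2}$ with $(\hat\psi_c - \psi)\{-l_c^{(2)}(\hat\psi_c)\}^{1/2}$ in (31).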

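Example 2. Suppose the parameter of interest is $\psi = E(\bar X^1) = \beta_1(\phi)$. An orthogonal parameterization is $\lambda = (\lambda^1, \ldots, \lambda^d) = \{\beta_1(\phi), \phi^2, \ldots, \phi^d\}$, and for a specified prior density $\bar\pi(\lambda)$, $T(\psi)$ can be calculated from (7). The Peers and Stein prior densities have the form $\bar\pi(\lambda) \propto [\beta_{11}\{\phi(\lambda)\}]^{-1/2}g(\lambda^2, \ldots, \lambda^d)$, and if the function $g(\lambda^2, \ldots, \lambda^d)$ is taken to be constant, then $\bar\pi(\tilde\lambda)/\bar\pi(\hat\lambda) = \{\beta_{11}(\hat\phi)/\beta_{11}(\tilde\phi)\}^{1/2}$. It is easily seen that $\det\{\lambda^a_i(\phi)\} = \beta_{11}(\phi)$, whence $\det\{\phi^i_a(\hat\lambda)\}/\det\{\phi^i_a(\tilde\lambda)\} = \beta_{11}(\tilde\phi)/\beta_{11}(\hat\phi)$, and that $\tau(\psi) = n(\hat\psi - \psi)/\beta_{11}(\tilde\phi)$, where $\hat\psi = \bar X^1$. Therefore, (7) yields

\[ T(\psi) = \frac{\{\beta_{11}(\tilde\phi)\}^{3/2}}{n(\hat\psi - \psi)\{\beta_{11}(\hat\phi)\}^{1/2}}\left[\frac{\det\{J_{ij}(\hat\psi)\}}{Q(\psi)\det\{J_{ij}(\psi)\}}\right]^{1/2}, \tag{33} \]

with $J_{ij}(\psi) = n\{\beta_{ij}(\tilde\phi) + (\hat\psi - \psi)\beta_{1ij}(\tilde\phi)/\beta_{11}(\tilde\phi)\}$ and $Q(\psi) = J^{ij}(\psi)\beta_{1i}(\tilde\phi)\beta_{1j}(\tilde\phi)$.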
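Approximation (15) to $T(\psi)$ does not coincide with (33) because (14) fails to be exact in this case. Approximation (14) can be verified directly, since $r^i(\phi) = \delta^i_1/\beta_{11}(\phi)$ $(i = 1, \ldots, d)$ and $d\tilde\phi^i(\psi)/d\psi = r^i(\tilde\phi) + O_p(n^{-1/2})$ for values of $\psi$ such that $\psi - \hat\psi$ is $O_p(n^{-1/2})$. Thus,

\[ \mathrm{tr}\,r\{\tilde\phi(\psi)\} = -\frac{\beta_{111}(\tilde\phi)}{\{\beta_{11}(\tilde\phi)\}^2} = -\frac{d}{d\psi}\log\beta_{11}\{\tilde\phi(\psi)\} + O_p(n^{-1/2}), \]

which yields

\[ \int_\psi^{\hat\psi}\mathrm{tr}\,r\{\tilde\phi(s)\}\,ds = \log\left\{\frac{\beta_{11}(\tilde\phi)}{\beta_{11}(\hat\phi)}\right\} + O_p(n^{-1}), \]

and (14) follows.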

For  this  case. 


/c(V’)  =  Ip(V’)  -  ^  log 


'  Q{ip)dei{Iij{ip)y 
.C?(t^)det{/,dt^)}. 
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(34) 


and 


w 


when  7r(A)  cx  [^n{<A('^)}]  Approximation  (24)  to  /c(t/’)  is 


W)  =  W)  -  2  log 


Q(V>)det{/,j(^)}1  r  $xn{j>{s)}  ^  , 

Q(^)det{/ij(^)}J  [0u{Hs)W  ’’ 


), 


and approximation (25) to $T_c(\psi)$ coincides with (34).

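Example 3. Consider a sample $X_1, \ldots, X_n$ from the gamma distribution having density
$f(x) = \{(\nu/\mu)^{\nu}/\Gamma(\nu)\}\, x^{\nu-1} \exp\{-(\nu/\mu)x\}$, $x > 0$, with mean $\mu$ and shape parameter
$\nu$. The parameters $\mu$ and $\nu$ are orthogonal. Let $\bar X = n^{-1}\sum X_i$ and $X^* = (\prod X_i)^{1/n}$.
Then $\hat\mu = \bar X$, and $\hat\nu$ satisfies

$\rho(\hat\nu) = \log\frac{X^*}{\bar X}$,   $\rho(\nu) = \frac{d}{d\nu}\log\Gamma(\nu) - \log\nu$.

Suppose the mean $\mu$ is the parameter of interest. The constrained maximum likelihood
estimator $\tilde\nu = \tilde\nu(\mu)$ is given by

$\rho(\tilde\nu) = \log\frac{X^*}{\bar X} + \log\frac{\bar X}{\mu} - \frac{\bar X}{\mu} + 1$,

and the signed root of the likelihood ratio statistic is

$R(\mu) = \mathrm{sgn}(\bar X - \mu)\,[2n\{\zeta(\hat\nu) - \zeta(\tilde\nu)\}]^{1/2}$,   $\zeta(\nu) = \nu\frac{d}{d\nu}\log\Gamma(\nu) - \log\Gamma(\nu) - \nu$.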


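The Peers and Stein prior densities are of the form $\pi(\mu, \nu) \propto g(\nu)/\mu$, so

$T(\mu) = \frac{\mu\,\hat\nu^{1/2}\,g(\tilde\nu)}{n^{1/2}(\bar X - \mu)\,\tilde\nu\,g(\hat\nu)} \left\{\frac{\rho^{(1)}(\hat\nu)}{\rho^{(1)}(\tilde\nu)}\right\}^{1/2}$,

where $\rho^{(1)}(\nu) = d\rho(\nu)/d\nu = d^2\log\Gamma(\nu)/d\nu^2 - 1/\nu$.

For the choice $g(\nu) = \nu\rho^{(1)}(\nu)$,

$T(\mu) = \frac{\mu}{n^{1/2}(\bar X - \mu)\,\hat\nu^{1/2}} \left\{\frac{\rho^{(1)}(\tilde\nu)}{\rho^{(1)}(\hat\nu)}\right\}^{1/2}$.    (35)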
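The following Python sketch shows one way (35) might be evaluated, under the assumption that (35) reads $T(\mu) = \mu\,\{\rho^{(1)}(\tilde\nu)/\rho^{(1)}(\hat\nu)\}^{1/2} / \{n^{1/2}(\bar X - \mu)\hat\nu^{1/2}\}$ with $\rho^{(1)}(\nu) = d^2\log\Gamma(\nu)/d\nu^2 - 1/\nu$. The estimates $\hat\nu$ and $\tilde\nu$ are taken as inputs; the function names and the series-based trigamma routine are hypothetical helpers, not part of the paper.

```python
import math

def trigamma(x):
    """Trigamma function for x > 0: upward recurrence, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:                      # psi1(x) = psi1(x + 1) + 1/x^2
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7)
    tail = inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))
    return acc + inv + 0.5 * inv2 + tail

def rho1(nu):
    """rho^(1)(nu) = trigamma(nu) - 1/nu, positive for nu > 0."""
    return trigamma(nu) - 1.0 / nu

def t_mean_35(mu, xbar, n, nu_hat, nu_tilde):
    """Assumed form of (35); requires xbar != mu."""
    scale = math.sqrt(n) * (xbar - mu) * math.sqrt(nu_hat)
    return mu / scale * math.sqrt(rho1(nu_tilde) / rho1(nu_hat))
```

The trigamma evaluation is the only nontrivial numerical step here, which is the computational cost the text attributes to (35).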
It follows from Barndorff-Nielsen (1986, 1987) that for this particular $T(\mu)$ the solutions
to equations (4) have coverage error of order $O(n^{-3/2})$. Thus, when the prior density

is $\pi(\mu, \nu) \propto \nu\rho^{(1)}(\nu)/\mu$, the posterior quantiles of $\mu$ are approximate confidence limits

having coverage $1 - \alpha + O(n^{-3/2})$. A drawback of (35) is that it requires calculation of the

trigamma  function.  Note  that  the  choice  g{v)  =  produces 

which  is  computationally  simpler  than  (35).  Table  1  shows  simulated  coverage  probabilities 
for  approximate  confidence  limits  obtained  from  equations  (4)  with  T  given  by  (35)  and 
(36).  Although  (35)  yields  slightly  better  coverage  accuracy  than  (36),  both  versions  of  T 
perform  well. 

Apart from an additive constant, $l_c(\mu)$ is given by

$l_c(\mu) = n\zeta(\tilde\nu) - \tfrac{1}{2}\log\rho^{(1)}(\tilde\nu)$,


and $T_c(\mu)$ is given by


A £(£) 

9(i>) 


For the choice $g(\nu) = \nu\rho^{(1)}(\nu)$, $T_c(\mu)$ becomes

Ti 

and the approximate confidence limits obtained by solving equations (20) with this $T_c(\mu)$
have  coverage  error  of  order 

Now suppose the shape parameter $\nu$ is of interest. Then $\tilde\mu(\nu) = \bar X$, and
$R(\nu) = \mathrm{sgn}(\hat\nu - \nu)\,(2n[\zeta(\hat\nu) - \zeta(\nu) - \nu\{\rho(\hat\nu) - \rho(\nu)\}])^{1/2}$.

The  Peers  and  Stein  prior  densities  are  7r(^,i/)  oc  g{p),  and 
It follows from (32) that if $T(\nu)$ is replaced by


U(  ^  ^ 

—  u)\v  J 


then the approximate confidence limits obtained from equations (4) have coverage $1 - \alpha + O(n^{-3/2})$.
Note that $T(\nu)$ is independent of the choice of $g(\mu)$, since $\tilde\mu(\nu) = \hat\mu$. Hence it
is impossible to find a prior density of the Peers and Stein type so that $T(\nu)$ and $U(\nu)$
agree, which suggests that confidence limits with coverage error of order $O(n^{-3/2})$ cannot
be constructed for $\nu$ using the Bayesian approach.

For this case,


$l_c(\nu) = n[\zeta(\nu) + \nu\{\rho(\hat\nu) - \rho(\nu)\}] - \tfrac{1}{2}\log\nu$,   $T_c(\nu) =$


Again,  it  follows  from  (32)  that  if  Tc{v)  is  replaced  by 


Uc{u)  = 


then the confidence limits obtained from equations (20) have coverage $1 - \alpha + O(n^{-3/2})$.
Table 1. Simulated coverage probabilities of approximate upper
$1 - \alpha$ confidence limits for gamma mean $\mu$

$(\mu, \nu) = (5, 0.25)$

                          n = 5                               n = 10
$1 - \alpha$    0.01    0.05    0.95    0.99        0.01    0.05    0.95    0.99
A              4.070  12.706  95.496  98.802       2.349   9.070  95.907  99.106
B  (35)        1.335   5.502  94.844  98.843       1.040   5.061  94.964  98.964
   (36)        1.434   5.767  94.686  98.777       1.080   5.155  94.888  98.931
C  (35)        1.365   5.669  94.866  98.869       1.055   5.102  94.969  98.967
   (36)        1.480   5.912  94.707  98.804       1.091   5.200  94.893  98.934

$(\mu, \nu) = (1, 1)$

                          n = 5                               n = 10
$1 - \alpha$    0.01    0.05    0.95    0.99        0.01    0.05    0.95    0.99
A              3.351  10.555  93.560  98.114       1.976   7.702  94.811  98.783
B  (35)        1.378   5.646  94.549  98.688       1.055   5.128  94.737  98.954
   (36)        1.459   5.903  94.347  98.616       1.099   5.260  94.831  98.912
C  (35)        1.380   5.716  94.586  98.738       1.054   5.141  94.943  98.963
   (36)        1.457   5.952  94.385  98.667       1.098   5.270  94.836  98.921

$(\mu, \nu) = (0.5, 10)$

                          n = 5                               n = 10
$1 - \alpha$    0.01    0.05    0.95    0.99        0.01    0.05    0.95    0.99
A              3.039   9.391  91.990  97.508       1.738   6.890  93.896  98.488
B  (35)        1.519   5.878  94.301  98.528       1.095   5.154  94.867  98.900
   (36)        1.592   5.919  94.264  98.526       1.103   5.179  94.844  98.892
C  (35)        1.541   5.881  94.330  98.578       1.081   5.149  94.877  98.916
   (36)        1.551   5.924  94.291  98.567       1.089   5.173  94.854  98.907

A, B and C refer to limits obtained by solving the equations $\Phi(R) = \alpha$,
$\Phi(R) + \varphi(R)(R^{-1} - T) = \alpha$ and $\Phi\{R - R^{-1}\log(RT)\} = \alpha$, respectively.
Table entries are percentages based on 1,000,000 trials.
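The three tail approximations named in the table note can be sketched in Python as follows. This is a minimal illustration, assuming the note reads $\Phi(R)$, $\Phi(R) + \varphi(R)(R^{-1} - T)$ and $\Phi\{R - R^{-1}\log(RT)\}$, with $\Phi$ and $\varphi$ the standard normal distribution and density functions. The function names are hypothetical, and $R$ and $T$ are taken as already computed inputs.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tail_a(r):
    """Method A: first-order normal approximation."""
    return norm_cdf(r)

def tail_b(r, t):
    """Method B: Phi(R) + phi(R) * (1/R - T); requires r != 0."""
    return norm_cdf(r) + norm_pdf(r) * (1.0 / r - t)

def tail_c(r, t):
    """Method C: Phi(R - log(R*T)/R); requires r != 0 and r * t > 0."""
    return norm_cdf(r - math.log(r * t) / r)
```

A confidence limit is then the parameter value at which the chosen function equals $\alpha$. When $T = R^{-1}$ both adjustments vanish and all three methods coincide.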